Stochastic Optimization with Heavy-Tailed Noise via Accelerated Gradient Clipping

Neural Information Processing Systems

In this paper, we propose a new accelerated stochastic first-order method called clipped-SSTM for smooth convex stochastic optimization with heavy-tailed noise in the stochastic gradients, and we derive the first high-probability complexity bounds for this method, closing the gap in the theory of stochastic optimization with heavy-tailed noise. Our method is based on a special variant of accelerated Stochastic Gradient Descent (SGD) and clipping of stochastic gradients. We extend our method to the strongly convex case and prove new complexity bounds that outperform state-of-the-art results in this case. Finally, we extend our proof technique and derive the first non-trivial high-probability complexity bounds for SGD with clipping without the light-tails assumption on the noise.
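The abstract mentions clipping of stochastic gradients and SGD with clipping. As a rough illustration only, the sketch below shows the standard norm-clipping operator, clip(g, λ) = min(1, λ/‖g‖) · g, inside a plain (non-accelerated) clipped SGD loop. It is not the authors' clipped-SSTM method. The quadratic objective, the Pareto-type noise model, and the names `clip`, `heavy_tailed_noise`, and `clipped_sgd` are assumptions chosen for the example.

```python
import math
import random


def clip(g, lam):
    """Rescale g so its Euclidean norm is at most lam (direction unchanged)."""
    norm = math.sqrt(sum(v * v for v in g))
    if norm <= lam:
        return list(g)
    return [v * (lam / norm) for v in g]


def heavy_tailed_noise(dim, rng):
    """Zero-mean, symmetric Pareto-type noise (an assumed toy noise model)."""
    return [rng.choice((-1.0, 1.0)) * (rng.paretovariate(2.5) - 1.0)
            for _ in range(dim)]


def clipped_sgd(x0, steps, step_size, lam, seed=0):
    """Minimise the toy objective f(x) = 0.5 * ||x||^2 with clipped noisy gradients."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        noise = heavy_tailed_noise(len(x), rng)
        # Stochastic gradient: the true gradient of f is x, plus noise.
        g = [xi + ni for xi, ni in zip(x, noise)]
        g = clip(g, lam)  # bound the step regardless of noise size
        x = [xi - step_size * gi for xi, gi in zip(x, g)]
    return x


if __name__ == "__main__":
    print(clipped_sgd([10.0, -10.0], steps=2000, step_size=0.05, lam=5.0))
```

Because the clipped gradient has norm at most `lam`, each iterate moves by at most `step_size * lam`, so a single extreme noise sample cannot throw the iterate far from its current position. That is the intuition behind using clipping under heavy-tailed noise.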


Review for NeurIPS paper: Stochastic Optimization with Heavy-Tailed Noise via Accelerated Gradient Clipping

Neural Information Processing Systems

Weaknesses: * In [71] there are several theoretical guarantees both for convex and non-convex cases. I am wondering why they are not mentioned in Table 2. On the other hand, their analysis also covers the case where the domain doesn't need to be compact. Doesn't this reduce the novelty of this paper? I am willing to increase my grade if this concern is addressed. It would be interesting to see a comparison between the results in this paper and theirs.


